Fast Garbage Collection................Col. Paul Shetler, MD
                                            Honolulu, Hawaii

When Applesoft programs manipulate strings, memory gradually fills up with little bits and pieces of old strings.  Eventually this space needs to be recovered so the program can continue.  The process of searching through all the still active strings, moving them back to the top of free memory, and making the remaining space available again is called "garbage collection".

Applesoft will automatically collect garbage when memory fills up.  However, the garbage collector in the Applesoft ROMs is pitifully slow.  Worse yet, the time to collect is proportional to the square of the number of strings in use.  That is, if you have 100 active strings it will take four times as long to collect garbage as if you had only 50 active strings.
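The square-law arithmetic can be checked with a few lines of Python. This is only a model of the scaling described above, not a measurement; the function name and the baseline of 50 strings are chosen for illustration.

```python
def rom_collect_ratio(n_strings, baseline=50):
    """Relative collection time under a square-law model.

    Returns how many times longer n_strings take to collect than
    `baseline` strings, assuming time grows as the square of the count.
    """
    return (n_strings / baseline) ** 2


# Each doubling of the string count quadruples the modelled time.
for n in (50, 100, 200, 400):
    print(n, "strings ->", rom_collect_ratio(n), "x the time for 50")
```

Under this model a program holding 400 strings waits 64 times as long as one holding 50.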

Cornelis Bongers, of Erasmus University in Rotterdam, Netherlands, published a brilliant Garbage Collector for Applesoft strings in Micro, August 1982.  The speed of his program, when compared to the one residing in ROM, is incredible.  And the time is directly proportional to the number of strings, rather than the square of the number of strings.  The only problem with his program is that it belongs to the magazine that published it.  Or worse yet, it is tied to a program called Ampersoft, marketed by Microsparc (publishers of Nibble magazine) for $50.  When I asked them about a license, they wanted big bucks.

So, I decided to write my own garbage collector, based on the ideas behind Cornelis Bongers' program.  And then I further decided to make it available to all readers of Apple Assembly Line, where I myself have received so much help.

There are several catches.  Normal Applesoft programs save all string data with the high-order bit of each byte zero (positive ASCII).  Further, normal Applesoft programs never allow more than one string variable to point to the same exact memory copy of the string.  The method of garbage collection my program uses (Bongers' method) DEPENDS on these constraints.  If either is not true, LOOK OUT!  Of course, if your Applesoft programs are normal, you need have no fear.  Only if you are doing exotic things with your own machine language appendages to Applesoft might these constraints be violated.

The basic concept is fairly simple. Applesoft uses descriptors to point to the string in the string pool.  The descriptor consists of three bytes -- the length, and the address of the characters in the string pool.
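As a rough illustration, the three-byte descriptor can be modelled in Python as below. The helper names are hypothetical; the address is stored low byte first, as is usual on the 6502.

```python
def make_descriptor(length, address):
    """Pack a string descriptor: length, address low byte, address high byte."""
    if not 0 <= length <= 255:
        raise ValueError("a string is at most 255 characters long")
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return bytes([length, address & 0xFF, address >> 8])


def read_descriptor(desc):
    """Unpack a three-byte descriptor into (length, address)."""
    length, lo, hi = desc
    return length, lo | (hi << 8)


# A five-character string living at $95F0 in the string pool.
d = make_descriptor(5, 0x95F0)
print(d.hex(), read_descriptor(d))
```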

Strings build down from the top of memory (HIMEM) and the descriptors build up from the end of the program in the variable space.  Since each new value assigned to a string is added to the bottom of the string pool, and the old value is simply left behind, the pool is soon full of "garbage".

Applesoft frees the garbage one string at a time.  This n-square method takes forever when there are large string arrays.  Bongers introduced the idea of marking active strings in the pool by setting the third byte in the string to a negative ASCII value, then storing the location of the descriptor in the first two bytes.  The first two bytes of the string are saved safely in the address field of the descriptor.  The address previously in the address field will be changed anyway after all the strings are moved up in memory.

Another pass through the string pool moves all active strings as high in memory as they can go, retrieves the first two characters from storage in the descriptor, and points the descriptor to the new string location.
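The two passes can be sketched in Python as follows. This is a simplified model and not the assembly listing: it assumes the second pass scans downward from the top of the pool, it uses an index into a Python list where the real routine would store a descriptor address, and it handles only strings of three or more characters. All names are hypothetical.

```python
def collect(mem, descs, pool_bottom, pool_top):
    """Compact live strings toward pool_top; return the new pool bottom.

    mem          bytearray standing in for RAM
    descs        list of [length, address] pairs (a stand-in descriptor table)
    pool_bottom  lowest address used by the string pool
    pool_top     one past the highest pool address (HIMEM)
    """
    # Pass 1: mark each live string and park its first two characters
    # in the address field of its descriptor.
    for i, (length, addr) in enumerate(descs):
        if length < 3:
            raise ValueError("short strings need separate treatment")
        descs[i][1] = mem[addr] | (mem[addr + 1] << 8)
        mem[addr] = i & 0xFF        # descriptor "location", low byte
        mem[addr + 1] = i >> 8      # descriptor "location", high byte
        mem[addr + 2] |= 0x80       # third byte becomes negative ASCII

    # Pass 2: scan down from the top; every negative byte met is the
    # third byte of a live string, so slide that string up to dest.
    dest = pool_top
    p = pool_top - 1
    while p >= pool_bottom:
        if mem[p] & 0x80:
            start = p - 2
            i = mem[start] | (mem[start + 1] << 8)
            length, parked = descs[i]
            dest -= length
            mem[dest:dest + length] = mem[start:start + length]
            mem[dest] = parked & 0xFF       # restore first character
            mem[dest + 1] = parked >> 8     # restore second character
            mem[dest + 2] &= 0x7F           # clear the mark
            descs[i][1] = dest              # repoint the descriptor
            p = start - 1
        else:
            p -= 1
    return dest
```

Each string is touched once in each pass, which is where the linear running time comes from. The sketch also shows why both constraints matter: a stray negative byte in the pool would be mistaken for a mark, and two descriptors sharing one string would fight over the same first two bytes.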

Since three bytes are used in the active strings, one- and two-character strings require different treatment.  On the first pass through the variable space, the characters pointed to by the 'short' descriptors are stored in the length byte and, if len=2, the low address byte of the descriptor.  The short descriptor is flagged with one or more $FF bytes, since no string can have an address greater than $FF00.

If short strings are found on the first pass, a third pass returns them to the string pool and points the descriptors to their new location.
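One way to read the short-string scheme is sketched below in Python. The exact byte layout is an assumption drawn from the description above: a one-character string leaves $FF in both address bytes, a two-character string leaves $FF in the high address byte only. The function names are hypothetical, and empty strings are ignored.

```python
def park_short(mem, desc):
    """First pass: move a 1- or 2-character string into its own descriptor.

    desc is a list [length, addr_lo, addr_hi] and is rewritten in place.
    """
    length = desc[0]
    addr = desc[1] | (desc[2] << 8)
    if length == 1:
        desc[:] = [mem[addr], 0xFF, 0xFF]
    elif length == 2:
        desc[:] = [mem[addr], mem[addr + 1], 0xFF]
    else:
        raise ValueError("not a short string")


def is_parked(desc):
    """A high address byte of $FF cannot belong to a real string address."""
    return desc[2] == 0xFF


def unpark_short(mem, desc, dest):
    """Third pass: write the parked characters just below dest.

    Restores a normal descriptor and returns the new bottom of the pool.
    """
    # Characters are positive ASCII, so $FF in the low byte means length 1.
    chars = [desc[0]] if desc[1] == 0xFF else [desc[0], desc[1]]
    dest -= len(chars)
    mem[dest:dest + len(chars)] = bytes(chars)
    desc[:] = [len(chars), dest & 0xFF, dest >> 8]
    return dest
```

The third pass would call unpark_short for every descriptor that is_parked reports, starting with dest at the bottom of the pool left by the second pass.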

Short strings do slow collection a little.  However, the time is still proportional to the number of strings, and not the number squared.

Bongers' program was driven by calls via the &-statement.  Mine differs in that it is invoked with the USR function.  Although it is easily converted to an ampersand routine, I wrote it using the USR function to provide fast garbage collection with Hayden's compiler (which also uses string descriptors and a string pool).  The compiler allows USR functions, but makes & difficult.  Another reason is to investigate some uses for USR.

USR(#) converts '#' to a floating point value in the FAC (floating point accumulator) and then jumps via $0A to the address pointed to in $0B, $0C. The results of the machine language subroutine can be returned in the FAC.  The USR function, floating point calls, and addresses are described in Apple's BASIC REFERENCE MANUAL FOR APPLESOFT (Product #A2L0006).
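To aim USR at a routine, the three bytes at $0A-$0C must hold a jump to it. The Python sketch below works out the POKE values for the routine's $9400 origin, assuming $0A should contain the 6502 JMP absolute opcode ($4C). The helper name is hypothetical.

```python
JMP_ABSOLUTE = 0x4C  # 6502 opcode for JMP to a 16-bit address


def usr_vector_pokes(entry):
    """Return (address, value) pairs that point the USR vector at entry."""
    if not 0 <= entry <= 0xFFFF:
        raise ValueError("entry must be a 16-bit address")
    return [
        (0x0A, JMP_ABSOLUTE),
        (0x0B, entry & 0xFF),   # low byte of the entry address
        (0x0C, entry >> 8),     # high byte of the entry address
    ]


# Print the Applesoft statement for a routine assembled at $9400.
print(" : ".join(f"POKE {addr},{value}" for addr, value in usr_vector_pokes(0x9400)))
```

For $9400 this prints POKE 10,76 : POKE 11,0 : POKE 12,148, which can go in the Applesoft program ahead of the first USR call.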

The USR argument for my garbage collector requires a number in the range of +32767 to -32767.  If the number is negative, the string pool is checked for negative ASCII.  If any such characters are found, USR(-1) will return a value of 0, and no garbage collection will be attempted.  If no negative ASCII characters are found, garbage collection will proceed.  In this case USR(-1) returns the number of bytes of free space after collection.

If the USR argument is zero, for example K = USR(0), then collection is forced and USR will return the amount of free space.  This is slightly faster than calling with USR(-1), because the preliminary scan for negative ASCII bytes is skipped.  But USR(-1) is safer, if you are not sure.

If you use a positive argument N in the USR function, then no garbage collection will be performed unless there are fewer than 256*N bytes of free space left.  Whether or not collection is performed, USR will tell you how much free space is left.

Only the lower five bits of the USR argument are tested.  This means that USR(32) is the same as USR(0), USR(33) is the same as USR(1), and so on.
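The argument rules can be summarized as a small decision function. The Python below is a model of the behaviour described in the text, not a transcription of the assembly code. It assumes the sign is tested separately from the five low bits, and the names are hypothetical.

```python
def usr_action(arg, free_bytes, pool_has_negative_ascii):
    """Decide what a USR(arg) call does: 'refuse', 'collect', or 'skip'.

    arg                      the USR argument, -32767 to +32767
    free_bytes               free space before the call
    pool_has_negative_ascii  True if any pool byte has its high bit set
    """
    if not -32767 <= arg <= 32767:
        raise ValueError("argument out of range")
    if arg < 0:
        # Safe mode: scan first, and return 0 without collecting if the
        # pool holds negative ASCII.
        return "refuse" if pool_has_negative_ascii else "collect"
    n = arg & 0x1F              # only the lower five bits count
    if n == 0:
        return "collect"        # forced collection, no preliminary scan
    # Threshold mode: collect only when free space is under 256*N bytes.
    return "collect" if free_bytes < 256 * n else "skip"


print(usr_action(4, 1000, False))   # 1000 is under the 1024-byte threshold
print(usr_action(4, 2000, False))   # 2000 is over it
```

With five bits, the largest usable threshold is N=31, or 7936 bytes.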

I have shown the program as residing at $9400, but of course you may re-assemble it for any favorite place.

The following Applesoft program makes a lot of garbage, and sees to the collection of it using my garbage collector.  If the call to the USR function in line 245 is left out, the program dies for 47 seconds while Applesoft does its own garbage collection.  With the USR call as shown, the delay is less than one second.

<<<<sample here>>>>

<<<<collector listing here>>>>
